Non-Deterministic Finite Cover Automata


Cezar CAMPEANU


Abstract 


The concept of Deterministic Finite Cover Automata (DFCA) was
introduced at WIA '98 as a more compact representation than
Deterministic Finite Automata (DFA) for finite languages. In some
cases, representing a finite language using a Non-deterministic Finite
Automaton (NFA) may significantly reduce the number of required
states. The combined succinctness obtained by representing finite
languages with both cover languages and non-determinism
has been suggested, but never systematically studied. In the present
paper, for non-deterministic finite cover automata (NFCA) and
l-non-deterministic finite cover automata (l-NFCA), we show that
minimization can be as hard as minimizing NFAs for regular languages,
even in the case of NFCAs using unary alphabets. Moreover, we show
how we can adapt the methods used to reduce or minimize the size of
NFAs/DFCAs/l-DFCAs for simplifying NFCAs/l-NFCAs.


Keywords: Regular languages, finite languages, cover automata,
l-cover automata, similarity relation


1 Introduction 


The race to find more compact representations for finite languages was started
in 1959, when Michael O. Rabin and Dana Scott introduced the notion of
Non-deterministic Finite Automata, and showed that the equivalent Deterministic
Finite Automaton can be, in terms of number of states, exponentially larger
than the NFA. Since then, it was proved in [28] that we can obtain a polynomial
algorithm for minimizing DFAs, and in [19] that an O(n log n)
algorithm exists. In the meantime, several heuristic approaches have been
proposed to reduce the size of NFAs [2, 21], and it was proved by Jiang
and Ravikumar [22] that NFA minimization problems are hard; even in
the case of regular languages over a one-letter alphabet, the minimization is
NP-complete [13, 22].


On the other hand, in case of finite languages, we can obtain minimizing 
algorithms [25, 29] that are in the order of O(n), where n is the number 
of states of the original DFA. In [6, 8, 24] it has been shown that using 
Deterministic Finite Cover Automata to represent finite languages, we 
have minimization algorithms as efficient as the best known algorithm for 
minimizing DFAs for regular languages. 


The study of the state complexity of operations on regular languages 
was initiated by Maslov in 1970 [25, 26], but has not become a subject of 
systematic study until 1994 [31]. The special case of state complexity of 
operations on finite languages was studied in [7]. 


Non-deterministic state complexity of regular languages was also a subject
of interest, for example in [15, 16, 17, 18]. To find lower bounds for the
non-deterministic state complexity of regular languages, the fooling set technique,
or the extended fooling set technique, may be used [3, 11, 13].


In this paper we show that NFCA state complexity for a finite language 
L can be exponentially lower than NFA or DFCA state complexity of the 
same language. We modify the fooling set technique for cover automata, to 
help us prove lower bounds for NFCA state complexity in Section 3. We also 
show that the (extended) fooling set technique is not optimal, as we have 
minimal NFCAs with arbitrary number of states, and the largest fooling set 
has constant size, Theorem 4. In Section 4 we show that minimizing NFCAs 
is hard, and in Section 5 we show that heuristic approaches for minimizing 
DFAs or NFAs need a special treatment when applied to NFCAs, as many 
results valid for the DFCAs are no longer true for NFCAs. We show a 
connection between fooling sets and NFA state reduction in Section 6. In 
Section 7, we formulate a few open problems and future research directions.


2 Notations and Definitions


The number of elements of a set $T$ is denoted by $\#T$. In case $\Sigma$ is an
alphabet, i.e., a finite non-empty set, the free monoid generated by $\Sigma$ is $\Sigma^*$,
and it is the set of all words over $\Sigma$; the empty word, i.e., the word with
no letters, is denoted by $\varepsilon$. The length of a word $w = w_1 w_2 \ldots w_n$, $n \ge 0$,
$w_i \in \Sigma$, $1 \le i \le n$, is $|w| = n$; in particular $|\varepsilon| = 0$ (for $n = 0$, $w = \varepsilon$). The
set of words of length equal to $l$ is $\Sigma^{l}$, and the set of words of length less than or
equal to $l$ is denoted by $\Sigma^{\le l}$. In a similar fashion, we define $\Sigma^{\ge l}$, $\Sigma^{< l}$, or $\Sigma^{> l}$.
A finite automaton is a structure $A = (Q, \Sigma, \delta, q_0, F)$, where $Q$ is a finite
non-empty set called the set of states, $\Sigma$ is an alphabet, $q_0 \in Q$, $F \subseteq Q$ is
the set of final states, and $\delta$ is the transition function. For the function $\delta$,
we distinguish the following cases:


• if $\delta : Q \times \Sigma \to Q$, the automaton is deterministic; in case $\delta$ is always
defined, the automaton is complete, otherwise it is incomplete;


• if $\delta : Q \times \Sigma \to 2^Q$, the automaton is non-deterministic.


The language accepted by an automaton is defined by
$L(A) = \{w \in \Sigma^* \mid \delta(\{q_0\}, w) \cap F \ne \emptyset\}$, where $\delta(S, w)$ is defined as follows:


$$\delta(S, \varepsilon) = S, \qquad \delta(S, wa) = \bigcup_{q \in \delta(S, w)} \delta(\{q\}, a).$$


Of course, $\delta(\{q\}, a) = \{\delta(q, a)\}$ in case the automaton is deterministic, and
$\delta(\{q\}, a) = \delta(q, a)$ in case the automaton is non-deterministic.
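The inductive definition of $\delta(S, w)$ is a subset simulation. A minimal Python sketch follows, assuming an NFA stored as a dictionary from (state, letter) pairs to sets of states; the names `extended_delta` and `accepts` and the toy automaton are illustrative assumptions, not part of the paper.

```python
def extended_delta(delta, states, word):
    """Compute delta(S, w): delta(S, eps) = S and
    delta(S, wa) = union of delta(q, a) over all q in delta(S, w)."""
    current = set(states)
    for letter in word:
        nxt = set()
        for q in current:
            # A missing dictionary entry means "no transition".
            nxt |= delta.get((q, letter), set())
        current = nxt
    return current


def accepts(delta, q0, finals, word):
    """w is accepted iff delta({q0}, w) intersects the set of final states."""
    return bool(extended_delta(delta, {q0}, word) & finals)


# Hypothetical NFA over {a, b}: accepts words whose second-to-last letter is a.
delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "a"): {2}, (1, "b"): {2}}
finals = {2}
```

For instance, `accepts(delta, 0, finals, "bab")` follows the non-deterministic branch that guesses the second-to-last letter.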


Definition 1 Let $L$ be a finite language, and $l$ be the length of the longest
word $w$ in $L$, i.e., $l = \max\{|w| \mid w \in L\}$ (see footnote 3). If $L$ is a finite language, $L'$ is a
cover language for $L$ if $L' \cap \Sigma^{\le l} = L$.

A cover automaton for a finite language $L$ is an automaton that recognizes
a cover language, $L'$, for $L$. An l-NFCA $A$ is a cover automaton for
the language $L(A) \cap \Sigma^{\le l}$.


One could obviously see that any automaton that recognizes L is also a 
cover automaton. 
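Definition 1 can be checked by brute force on small examples. The sketch below assumes the same dictionary encoding of an NFA as above (an assumption of this illustration, not of the paper) and compares the automaton with the finite language on every word of length at most $l$; `is_cover_automaton` is a hypothetical helper name.

```python
from itertools import product


def accepts(delta, q0, finals, word):
    current = {q0}
    for letter in word:
        current = set().union(*(delta.get((q, letter), set()) for q in current))
    return bool(current & finals)


def is_cover_automaton(delta, q0, finals, alphabet, language):
    """True iff L(A) restricted to words of length <= l equals `language`."""
    # Convention from footnote 3: the maximum over the empty set is 0.
    l = max((len(w) for w in language), default=0)
    for n in range(l + 1):
        for letters in product(alphabet, repeat=n):
            word = "".join(letters)
            if accepts(delta, q0, finals, word) != (word in language):
                return False
    return True


# Hypothetical example: a single final state with an a-loop accepts {a}*,
# which is a cover language for {eps, a, aa} (here l = 2).
loop = {(0, "a"): {0}}
```

The one-state automaton is a cover automaton for $\{\varepsilon, a, aa\}$ although it accepts infinitely many longer words; it is not one for $\{\varepsilon, aa\}$, because it wrongly accepts $a$.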


3 We use the convention that $\max \emptyset = 0$.

The level of a state $s \in Q$ in a cover automaton $A = (Q, \Sigma, \delta, q_0, F)$ is
the length of the shortest word that can reach the state $s$, i.e.,
$\mathrm{level}_A(s) = \min\{|w| \mid s \in \delta(q_0, w)\}$.

Let us denote by $x_A(s)$ the smallest word $w$, according to the
quasi-lexicographical order, such that $s \in \delta(q_0, w)$; see [8] for a similar definition
in the case of DFCAs. Obviously, $\mathrm{level}_A(s) = |x_A(s)|$.

For a regular language $L$, $\equiv_L$ denotes the Myhill-Nerode equivalence of
words [20, 30].

The similarity relation induced by a finite language $L$ is defined as
follows [8]: $x \sim_L y$ if, for all $w \in \Sigma^{\le l - \max\{|x|,|y|\}}$, $xw \in L$ iff $yw \in L$. A
dissimilar sequence for a finite language $L$ is a sequence $x_1, \ldots, x_n$ such that
$x_i \not\sim_L x_j$ for all $1 \le i, j \le n$ and $i \ne j$.
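For a finite language given explicitly as a set of words, the similarity of two words can be tested directly from the definition. This is a brute-force sketch; the function name `similar_words` and the example language are assumptions made for illustration.

```python
from itertools import product


def similar_words(x, y, language, alphabet):
    """x ~_L y: xw and yw agree on membership in L for every word w
    of length at most l - max(|x|, |y|)."""
    l = max((len(w) for w in language), default=0)
    bound = l - max(len(x), len(y))
    # If bound is negative the range is empty and the words are
    # vacuously similar.
    for n in range(bound + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if ((x + w) in language) != ((y + w) in language):
                return False
    return True


# Hypothetical unary example with l = 4.
L = {"aa", "aaaa"}
```

Here $a$ and $aaa$ are similar, since only suffixes of length at most 1 are compared, while $\varepsilon$ and $a$ are distinguished by the suffix $a$.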

Now, we need to define the similarity for states in an NFCA, since it 
was the main notion used for DFCA minimization. 


Definition 2 In an NFCA $A = (Q, \Sigma, \delta, q_0, F)$, two states $p, q \in Q$ are
similar, written $p \sim_A q$, if $\delta(p, w) \cap F \ne \emptyset$ iff $\delta(q, w) \cap F \ne \emptyset$, for all
$w \in \Sigma^{\le l - \max\{\mathrm{level}(p), \mathrm{level}(q)\}}$.


In all cases when the automaton $A$ is understood, we may omit the
subscript $A$, i.e., we write $p \sim q$ instead of $p \sim_A q$; also, we can write $\mathrm{level}(p)$
instead of $\mathrm{level}_A(p)$.

We consider only non-trivial NFCAs for $L$, i.e., NFCAs such that
$\mathrm{level}(p) \le l$ for all states $p$. In case $\mathrm{level}(p) > l$, $p$ can be eliminated, and
the resulting NFA is still an NFCA for $L$.

In case $\mathrm{level}(p) \le l$, $\mathrm{level}(q) \le l$, and $p \sim q$, then either $p, q \in F$, or
$p, q \in Q \setminus F$, because $|\varepsilon| \le l - \max\{\mathrm{level}(p), \mathrm{level}(q)\}$.
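Levels and state similarity can both be computed by exhaustive search on small automata. The sketch below assumes the dictionary encoding of an NFA used for illustration in this text; the automaton is a hypothetical unary NFCA for $\{a^2, a^4\}$ with $l = 4$, not one of the paper's figures.

```python
from collections import deque
from itertools import product


def levels(delta, q0, alphabet):
    """Breadth-first search: level(s) is the length of a shortest word reaching s."""
    level = {q0: 0}
    queue = deque([q0])
    while queue:
        s = queue.popleft()
        for a in alphabet:
            for t in delta.get((s, a), set()):
                if t not in level:
                    level[t] = level[s] + 1
                    queue.append(t)
    return level


def reaches_final(delta, state, finals, word):
    current = {state}
    for a in word:
        current = set().union(*(delta.get((s, a), set()) for s in current))
    return bool(current & finals)


def similar_states(delta, q0, finals, alphabet, l, p, q):
    """p ~_A q as in Definition 2, checked on all words up to the length bound."""
    level = levels(delta, q0, alphabet)
    bound = l - max(level[p], level[q])
    for n in range(bound + 1):
        for w in product(alphabet, repeat=n):
            if reaches_final(delta, p, finals, w) != reaches_final(delta, q, finals, w):
                return False
    return True


# Hypothetical NFCA: 0 -a-> 1 -a-> 2 -a-> 3 -a-> 4 (final, with an a-loop),
# and 0 -a-> 5 -a-> 6 (final).
delta = {(0, "a"): {1, 5}, (1, "a"): {2}, (2, "a"): {3},
         (3, "a"): {4}, (4, "a"): {4}, (5, "a"): {6}}
finals = {4, 6}
```

In this automaton states 3 and 5 are similar, because only words of length at most $4 - \max\{3, 1\} = 1$ are compared, whereas states 1 and 5 are distinguished by the word $a$.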

Deterministic state complexity of a regular language $L$ is defined as the
number of states of the minimal deterministic automaton recognizing $L$, and
it is denoted by $\mathrm{sc}(L)$:


$\mathrm{sc}(L) = \min\{\#Q \mid A = (Q, \Sigma, \delta, q_0, F)$ is deterministic, complete,
and $L = L(A)\}$.

Non-deterministic state complexity of a regular language $L$ is defined as
the number of states of the minimal non-deterministic automaton recognizing
$L$, and it is denoted by $\mathrm{nsc}(L)$:


$\mathrm{nsc}(L) = \min\{\#Q \mid A = (Q, \Sigma, \delta, q_0, F)$ is non-deterministic and $L = L(A)\}$.

For finite languages $L$, we can also define deterministic cover state complexity
$\mathrm{csc}(L)$ and non-deterministic cover state complexity $\mathrm{ncsc}(L)$:


$\mathrm{csc}(L) = \min\{\#Q \mid A = (Q, \Sigma, \delta, q_0, F)$ is deterministic, complete, and
$L = L(A) \cap \Sigma^{\le l}\}$,
$\mathrm{ncsc}(L) = \min\{\#Q \mid A = (Q, \Sigma, \delta, q_0, F)$ is non-deterministic, and
$L = L(A) \cap \Sigma^{\le l}\}$.


Obviously, $\mathrm{ncsc}(L) \le \mathrm{nsc}(L) \le \mathrm{sc}(L)$, but also $\mathrm{ncsc}(L) \le \mathrm{csc}(L) \le
\mathrm{sc}(L)$. Thus, non-deterministic finite cover automata can be considered to
be one of the most compact representations of finite languages.


3  Lower-Bounds and Compression Ratio for 
NFCAs 


We start this section by analyzing a few examples where non-determinism, or the
use of a cover language, reduces the state complexity. Let us first analyze the
type of languages where non-determinism, combined with cover properties,
significantly reduces the state complexity.

We choose the language $L_{F_{m,n}} = \{a,b\}^{\le m} a \{a,b\}^{n-2}$, where $m, n \in \mathbb{N}$.
In Figure 1, we can see an NFA recognizing $L_{F_{m,n}}$ with $m + n$ states. Please
note that the longest word in the language has $m + n - 1$ letters.


Figure 1: An NFA with $m + n$ states for the language $L_{F_{m,n}} =
\{a,b\}^{\le m} a \{a,b\}^{n-2}$.


Let us analyze if the automaton in Figure 1 is minimal. The fooling set 
technique, introduced in [10] and [12] and used to prove the lower-bound for 
state complexity of NFAs, is stated in [3, 10] as follows: 

Lemma 1 Let $L \subseteq \Sigma^*$ be a regular language, and suppose there exists a set
of pairs $S = \{(x_i, y_i) \mid 1 \le i \le n\}$ with the following properties:


1. If $x_i y_i \in L$, for $1 \le i \le n$, and $x_i y_j \notin L$, for $1 \le i, j \le n$, $i \ne j$,
then $\mathrm{nsc}(L) \ge n$. The set $S$ is called a fooling set for $L$.


2. If $x_i y_i \in L$, for $1 \le i \le n$, and for $1 \le i, j \le n$, $i \ne j$ implies
either $x_i y_j \notin L$ or $x_j y_i \notin L$, then $\mathrm{nsc}(L) \ge n$. The set $S$ is called an
extended fooling set for $L$.


Now, consider the language $L_{F_{m,n}}$ and the following set of pairs of words,
$S = S_1 \cup S_2 = \{(x_k, y_k) \mid 1 \le k \le m+n\}$, where $S_1 = \{(b^m a b^j, b^{n-2-j}) \mid
0 \le j \le n-2\}$ and $S_2 = \{(a^i, b^{m-i} a b^{n-2}) \mid 0 \le i \le m\}$.

For $(x_k, y_k) \in S$, we have that


1. $x_k y_k = b^m a b^j b^{n-2-j} = b^m a b^{n-2} \in L_{F_{m,n}}$, or
2. $x_k y_k = a^i b^{m-i} a b^{n-2} \in L_{F_{m,n}}$.


Let us examine for each $1 \le k, h \le m+n$, $k \ne h$, if the words $x_k y_h$ and
$x_h y_k$ are also in $L$. We have the following possibilities:


1. Case I
$(x_k, y_k) = (b^m a b^i, b^{n-2-i}) \in S_1$ and $(x_h, y_h) = (b^m a b^j, b^{n-2-j}) \in S_1$
(a) $x_k y_h = b^m a b^i b^{n-2-j} \notin L_{F_{m,n}}$ and
(b) $x_h y_k = b^m a b^j b^{n-2-i} \notin L_{F_{m,n}}$.
2. Case II
$(x_k, y_k) = (a^i, b^{m-i} a b^{n-2}) \in S_2$ and $(x_h, y_h) = (a^j, b^{m-j} a b^{n-2}) \in S_2$
(a) $x_k y_h = a^i b^{m-j} a b^{n-2} \in L_{F_{m,n}}$, if $i < j$, but
(b) $x_h y_k = a^j b^{m-i} a b^{n-2} \notin L_{F_{m,n}}$, if $i < j$ (because $|a^j b^{m-i} a b^{n-2}| =
m + n - 1 + j - i > m + n - 1$).
3. Case III
$(x_k, y_k) = (b^m a b^j, b^{n-2-j}) \in S_1$ and $(x_h, y_h) = (a^i, b^{m-i} a b^{n-2}) \in S_2$
(a) $x_k y_h = b^m a b^j b^{m-i} a b^{n-2} \notin L_{F_{m,n}}$ (because $|b^m a b^j b^{m-i} a b^{n-2}| =
m + 1 + j + m - i + 1 + n - 2 > m + n - 1$).

From statement 2 of Lemma 1, it follows that the NFA is minimal.
We must note the following: 


1. we cannot use the weak form 1 to prove the lower-bound; 


2. when proving the lower-bound, we concatenate words to obtain a word 
of length greater than the maximum length of the words in the language, 
and that’s why x;y; is rejected. Since in case of cover automata such 
words will be automatically rejected, there is no doubt that any fooling 
set type technique we may use to prove the lower-bound for NFCAs 
must consider the length, ignoring the cases when the length exceeds 
the maximal one. 
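The case analysis for $S = S_1 \cup S_2$ can be replayed mechanically for small parameters. The sketch below assumes $n \ge 2$ and encodes membership in $\{a,b\}^{\le m} a \{a,b\}^{n-2}$ by looking at the letter that is followed by exactly $n - 2$ letters; all function names are illustrative assumptions.

```python
def in_L(word, m, n):
    """Membership in {a,b}^{<=m} a {a,b}^{n-2}, for n >= 2."""
    pos = len(word) - (n - 1)  # index of the letter followed by n-2 letters
    return 0 <= pos <= m and word[pos] == "a"


def fooling_pairs(m, n):
    s1 = [("b" * m + "a" + "b" * j, "b" * (n - 2 - j)) for j in range(n - 1)]
    s2 = [("a" * i, "b" * (m - i) + "a" + "b" * (n - 2)) for i in range(m + 1)]
    return s1 + s2


def is_fooling_set(pairs, member):
    """Weak form (statement 1 of Lemma 1): every cross word is rejected."""
    if not all(member(x + y) for x, y in pairs):
        return False
    return all(not member(xi + yj)
               for i, (xi, _) in enumerate(pairs)
               for j, (_, yj) in enumerate(pairs) if i != j)


def is_extended_fooling_set(pairs, member):
    """Extended form (statement 2): at least one cross word is rejected."""
    if not all(member(x + y) for x, y in pairs):
        return False
    for i, (xi, yi) in enumerate(pairs):
        for j, (xj, yj) in enumerate(pairs):
            if i < j and member(xi + yj) and member(xj + yi):
                return False
    return True
```

For $m = 2$, $n = 4$ the set has $m + n = 6$ pairs, satisfies the extended property, and fails the weak one, which matches the first remark above.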


Hence, the fooling set technique introduced in [10] and [12], and used 
to prove the lower-bound for state complexity of NFAs, can be modified to 
prove a lower-bound for minimal NFCAs, and it can be formulated for cover 
languages as an adaptation of Theorem 1 in [13]. 


Lemma 2 Let $L \subseteq \Sigma^{\le l}$ be a finite language such that the longest word in $L$
has the length $l$, and suppose there exists a set of pairs $S = \{(x_i, y_i) \mid x_i y_i \in
L, 1 \le i \le n\}$, with the following properties:


1. For all $i, j$, $1 \le i, j \le n$, such that $x_i y_j \in \Sigma^{\le l}$, if $i \ne j$, we have that
$x_i y_j \notin L$. Then we have that $\mathrm{ncsc}(L) \ge n$.
The set $S$ is called a fooling set for $L$.

2. For all $i, j$, $1 \le i, j \le n$, if $i \ne j$, we have either $x_i y_j \in \Sigma^{\le l}$ and
$x_i y_j \notin L$, or $x_j y_i \in \Sigma^{\le l}$ and $x_j y_i \notin L$. Then we have that $\mathrm{ncsc}(L) \ge n$.
The set $S$ is called an extended fooling set for $L$.


Proof: Assume there exists an NFCA $A = (Q, \Sigma, \delta, q_0, F)$, with $m$ states
accepting $L$ and $m < n$. For each $i$, $1 \le i \le n$, $x_i y_i \in L$, therefore we must
have a state $s_i \in \delta(q_0, x_i)$ with $\delta(s_i, y_i) \cap F \ne \emptyset$. In other words, there exists
a state $f_i \in F$ such that $f_i \in \delta(s_i, y_i)$.


1. We claim $s_i \notin \delta(q_0, x_j)$ for all $j \ne i$, thus $m \ge n$. If $s_i \in \delta(q_0, x_j)$,
then $f_i \in \delta(s_i, y_i) \subseteq \delta(q_0, x_j y_i)$, and because $|x_j y_i| \le l$, it follows that
$x_j y_i \in L$, a contradiction.


2. We consider the function $f : \{1, \ldots, n\} \to Q$ defined by $f(i) = s_i$, with $s_i$
as above. We claim that $f$ is injective, thus $m \ge n$. If $f(i) = f(j)$,
then $\delta(f(i), y_i) = \delta(f(j), y_i)$, and also $\delta(f(j), y_j) = \delta(f(i), y_j)$. Because
$\delta(f(i), y_i) \cap F \ne \emptyset$, we also have that $\delta(f(j), y_i) \cap F \ne \emptyset$, and because
$|x_j y_i| \le l$, it follows that $x_j y_i \in L$, a contradiction. If $|x_i y_j| \le l$, using
the same reasoning, it follows that $x_i y_j \in L$.


In both cases we have a contradiction, thus $Q$ must have at least $n$ elements.


For the example above, we discover that we cannot have more than
one pair of the form $(a^i, b^{m-i} a b^{n-2})$; thus, applying the extended fooling set
technique for NFCAs, the number of states in a minimal NFCA is
at least $n - 2 + 1 + 1 = n$. This proves that the NFCA presented in Figure 2
is minimal.
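The cover version of the extended fooling set property (Lemma 2, statement 2) only counts a rejected cross word as a witness when its length is at most $l$. The sketch below checks this for $S_1$ together with a single pair from $S_2$; the choice of the pair with $i = 0$ is an assumption of this illustration, picked because its cross words with $S_1$ contain no $a$.

```python
def in_L(word, m, n):
    """Membership in {a,b}^{<=m} a {a,b}^{n-2}, for n >= 2."""
    pos = len(word) - (n - 1)
    return 0 <= pos <= m and word[pos] == "a"


def is_cover_extended_fooling_set(pairs, member, l):
    if not all(member(x + y) for x, y in pairs):
        return False

    def witness(u):
        # Only words of length <= l may be used to separate two pairs.
        return len(u) <= l and not member(u)

    for i, (xi, yi) in enumerate(pairs):
        for j, (xj, yj) in enumerate(pairs):
            if i < j and not (witness(xi + yj) or witness(xj + yi)):
                return False
    return True


def cover_pairs(m, n):
    s1 = [("b" * m + "a" + "b" * j, "b" * (n - 2 - j)) for j in range(n - 1)]
    return s1 + [("", "b" * m + "a" + "b" * (n - 2))]
```

With $m = 2$, $n = 4$, $l = 5$ this gives $n = 4$ pairs with the cover property, while two pairs taken from $S_2$ fail it because their only rejected cross word is longer than $l$.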


Figure 2: An NFCA with $n$ states for the language $L_{F_{m,n}} =
\{a,b\}^{\le m} a \{a,b\}^{n-2}$, that is the same as the one in Figure 1. In case $m = 2$
and $n = 4$, the language is the same as the one described in Figure 3.
An equivalent minimal NFA has $m + n$ states.


It is easy to check that any two distinct words $w_1, w_2 \in \Sigma^{\le n-1}$, $w_1 \ne w_2$,
are not similar with respect to $\sim_L$. It follows that for the language presented
in Figure 1, $\mathrm{csc}(L) \ge 2^{n-1}$. One can also verify that for two distinct words
$uay$ and $waz$, if $|y| \ne |z|$, $|z|, |y| \le n - 2$, they are distinguishable; also, in
case $|z| = |y| \le n - 2$, the word $a^{n-2-|z|}$ will distinguish between all the
words for which $|u| \le n - 2 - |z|$ or $|w| \le n - 2 - |z|$, thus the number of
states in the minimal DFA is even larger than $\mathrm{csc}(L)$. In case $m = 2$ and
$n = 4$, the minimal DFCA is presented in Figure 3.

A simple computation shows us that the corresponding minimal DFA
for $L_{F_{2,4}}$ has 15 states.

To help the reader better understand the compression power of
NFCAs over NFAs, DFCAs, or DFAs, we present the corresponding automata
for a smaller example, i.e., for the language $L_{F_{2,3}}$. In this case, a minimal
NFCA presented in Figure 4 has 3 states, a minimal NFA in Figure 5 has 4
states, a minimal DFCA in Figure 7 has also 4 states, and the minimal DFA
in Figure 6 has 11 states.

Figure 3: A minimal DFCA with 8 states for the language $L_{F_{2,4}} =
\{a,b\}^{\le 2} a \{a,b\}^{2}$, $l = 5$; the equivalent minimal DFA has 15 states.


Figure 4: An NFCA with $n = 3$ states for the language $L_{F_{2,3}} =
\{a,b\}^{\le 2} a \{a,b\}$.


Figure 5: An NFA with $2 + 3 = 5$ states for the language $L_{F_{2,3}} =
\{a,b\}^{\le 2} a \{a,b\}$.


We can observe that we do have the following similarities in Figure 6:
$7 \sim 3$, $8 \sim 4$, $9 \sim 1$, $10 \sim 0$, thus we can obtain the corresponding minimal
DFCA in Figure 7.

These language examples show that NFCAs may be a much more
compact representation for finite languages than NFAs, or even DFCAs, and
motivate the study of such objects. In terms of compression, clearly the
number of states in the NFCA is exponentially smaller than the number of
states in the DFA, and in some cases, even exponentially smaller than in an
NFA.


Figure 6: The minimal DFA with 11 states for the language $L_{F_{2,3}} =
\{a,b\}^{\le 2} a \{a,b\}$.


Figure 7: A minimal DFCA with 4 states for the language $L_{F_{2,3}} =
\{a,b\}^{\le 2} a \{a,b\}$, $l = 4$.

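Let us set $\Sigma = \{a\}$, $l \ge k \ge 2$, and choose the following language:


$L_{X_{k,l}} = a(\Sigma^{\le l} - \{(a^k)^n \mid n \ge 0\})$. (1)


In Figure 8, the NFA $A_{X_k}$ accepts the language $L_{X_k} = a(\Sigma^* - \{(a^k)^n \mid
n \ge 0\}) = \{a a a^i \mid i \not\equiv k-1 \bmod k\}$, which is a cover language for $L_{X_{k,l}}$. In
other words, $A_{X_k}$ is an NFCA for $L_{X_{k,l}}$, therefore $\mathrm{ncsc}(L_{X_{k,l}}) \le \mathrm{csc}(L_{X_{k,l}}) \le
\mathrm{sc}(L_{X_{k,l}}) \le \min(l+1, k+1) = k + 1$.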
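The two descriptions of the infinite unary language agree: $a^n$ belongs to $a(\Sigma^* - \{(a^k)^n \mid n \ge 0\})$ exactly when $n \ge 1$ and $n - 1$ is not a multiple of $k$. The sketch below checks this against a $(k+1)$-state automaton consisting of an initial state followed by a cycle of length $k$. That shape is an assumption made for this illustration, since the figure itself is not reproduced here.

```python
def in_LX(n, k):
    """a^n is in a(Sigma* - {(a^k)^j}) iff n >= 1 and k does not divide n - 1."""
    return n >= 1 and (n - 1) % k != 0


def build_cycle_automaton(k):
    """Assumed shape: state 0 -a-> state 1, then the cycle 1 -> 2 -> ... -> k -> 1.
    State 1 is reached after 1, k+1, 2k+1, ... letters, so it is not final."""
    delta = {0: 1}
    for s in range(1, k + 1):
        delta[s] = s + 1 if s < k else 1
    finals = set(range(2, k + 1))
    return delta, finals


def accepts_unary(delta, finals, n):
    state = 0
    for _ in range(n):
        state = delta[state]
    return state in finals
```

The automaton has $k + 1$ states, in line with the bound $k + 1$ stated above.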

It is known [10, 16, 27] that the automaton $A_{X_k}$ is a minimal NFA for
$L_{X_k} = \bigcup_{l \in \mathbb{N}, l \ge k} L_{X_{k,l}}$, if $k$ is a prime number. However, this may not be a
minimal NFCA, as illustrated by the example in Figure 9, where $A_{X_k}$ is not
a minimal NFCA for $L_{X_{k,l}}$, even if it is a minimal NFA for the cover language.


Figure 8: An NFA/NFCA $A_{X_k}$ for $L_{X_k}$. In this particular case, $A_{X_k}$ is also the
minimal DFA.


Figure 9: A minimal NFCA for $L_{X_{k,l}}$, left, and a minimal NFA, $A_{X_k}$, for a
cover language, right.


We apply the extended fooling set technique for the language $L_{X_{k,l}}$.
Because the alphabet is unary, all the words in an extended fooling
set $S$ are powers of $a$. Thus, considering only pairs in the fooling set
$S$ such that the first word is not $\varepsilon$, we have that for some $r \in \mathbb{N}$:
$S \supseteq \{(a^{i_1}, a^{j_1}), (a^{i_2}, a^{j_2}), (a^{i_3}, a^{j_3}), \ldots, (a^{i_r}, a^{j_r})\}$, and $1 \le i_1, \ldots, i_r \le k$.

We show that $r$ cannot be greater than 3, thus $S$ has at most 4 elements.
It is enough to take four pairs $(a^{i_1}, a^{j_1}), (a^{i_2}, a^{j_2}), (a^{i_3}, a^{j_3}), (a^{i_4}, a^{j_4})$, and
show that they cannot have the extended fooling set property. We have
$a^{i_1}a^{j_2} \notin L_{X_{k,l}}$ or $a^{i_2}a^{j_1} \notin L_{X_{k,l}}$, and $a^{i_1}a^{j_3} \notin L_{X_{k,l}}$ or $a^{i_3}a^{j_1} \notin L_{X_{k,l}}$, and
$a^{i_2}a^{j_3} \notin L_{X_{k,l}}$ or $a^{i_3}a^{j_2} \notin L_{X_{k,l}}$. Without any loss of generality, we may
assume that $i_1 + j_2 = z_{12}k + 1$ and $i_1 + j_3 = z_{13}k + 1$, all the other cases being
similar, as they are just permutations of indexes, or replacing $i$'s by $j$'s. If
$i_1, i_2, i_3 \ge 1$, and $i_1 + j_2 = z_{12}k + 1$ and $i_1 + j_3 = z_{13}k + 1$ for some $z_{12}, z_{13} \in \mathbb{N}$,
then $i_2 + j_3 \ne z_{23}k + 1$ and $i_3 + j_2 \ne z_{32}k + 1$, for any $z_{23}, z_{32} \in \mathbb{N}$, because
$i_2 + j_3 = z_{23}k + 1$ would imply $i_1 + j_2 + i_2 + j_3 = xk + 2$, $x = z_{12} + z_{23}$, which
means $j_2 + i_2 = xk + 2 - z_{13}k - 1 = yk + 1$, for $y = x - z_{13}$, a contradiction,
and $i_3 + j_2 = z_{32}k + 1$ would imply $i_3 + j_2 + i_1 + j_3 = xk + 2$, $x = z_{13} + z_{32}$,
i.e., $i_3 + j_3 = xk + 2 - z_{12}k - 1 = yk + 1$, for $y = x - z_{12}$, which is also not
possible.

It follows that $r \le 3$, thus any extended fooling set for $L_{X_{k,l}}$ has at
most 4 elements.


4 Please note that for any finite language, there are infinitely many cover languages, as
in Definition 1.

Let $A$ be an NFA accepting $L \supseteq L_{X_{k,l}}$, and we can consider that it is
already in Chrobak normal form, as it is ultimately periodic. Thus, for each
$L$, $\mathrm{nsc}(L) \ge p_1 + \ldots + p_s$, where $p_i$ are primes, and each cycle has $p_i$ states,
$1 \le i \le s$. Now, let us prove that for $k$ prime, $A_{X_k}$ is minimal for some
language $L_{X_{k,l}}$, $l \ge k$.

Assume there exists an automaton $B = (Q_B, \Sigma, \delta_B, q_{0,B}, F_B)$ with $m$
states, $m < k + 1$, such that $L(B) = L_{X_{k,l}}$. It follows that the language
$L(B)$ will contain words with a length $x + hy$ for $x, y < k$, and all $h \in \mathbb{N}$.
For $h$ large enough, one of these words will be of length multiple of $k$ plus
1, because $k$ is prime, therefore, for large enough $l$, i.e., greater than some
$l_{0,k}$, $L_{X_{k,l}} \ne L(B)$, because $a^{zk+1} \in L(B) \setminus L_{X_{k,l}}$, for some $z \in \mathbb{N}$. Thus, the
number of states in $B$ is at least $k + 1$. The automaton $A_{X_k}$ is also a minimal
NFCA for languages $L_{X_{k,l}}$, if $l \ge l_{0,k}$, hence it follows that Theorem 7 in
[13] is also valid for cover automata:


Theorem 1 There is a sequence of languages $(L_{X_{k,l}})_{k \ge 2}$ such that the
non-deterministic cover complexity of $L_{X_{k,l}}$ is at least $k$, but the extended fooling
set for $L_{X_{k,l}}$ is of size $c$, where $c$ is a constant.


Now, we are ready to check how hard it is to obtain this minimal
representation of a finite language.


4 Minimization Complexity 


In this section we show that minimizing NFCAs is hard, and we'll show it 
with the exact same arguments from [14], used to prove that minimizing 
NFAs is hard. We will describe the construction from [12, 14], showing that 
we can also use it with only a minor addition for NFCAs. To keep the paper 
self contained, we include a complete description, and emphasize the changes 
required for the cover automata, rather than just presenting the differences. 

The idea of proving that NFCA minimization is NP-hard is the following:
we take an arbitrary logical formula in conjunctive normal form $F$, and we
build two languages $L_C$ and $L_B$ such that their union $L_F = L_B \cup L_C$ should
not include $\{a\}^*$ if $F$ is satisfiable; in other words, $\{a\}^*$ is an l-cover language
for $L_F \cap \{a\}^{\le l}$, iff the formula is not satisfiable. Let us consider a logical formula
$F \in 3SAT$, in the conjunctive normal form, i.e., $F = \bigwedge_{i=1}^{m} C_i$, where each
clause $C_i$, $1 \le i \le m$, is defined using variables $x_1, \ldots, x_n$, $C_i = u_1 \vee u_2 \vee u_3$,
and each $u_j$, $1 \le j \le 3$, is either $x_i$ or $\neg x_i$. Let $p_1, p_2, \ldots, p_n$ be distinct
prime numbers such that $p_1 < p_2 < \ldots < p_n$. We set $k = \prod_{i=1}^{n} p_i$, and using the
Chinese Remainder Theorem [23] (see footnote 5), it follows that there exists a bijective
function $f : \mathbb{Z}_k \to \prod_{i=1}^{n} \mathbb{Z}_{p_i}$, such that $f(x) = (x \bmod p_1, \ldots, x \bmod p_n)$.
We need to define a language $L_F$ and a natural number $l$, such that $L_F =
\{a\}^*$, if and only if $F$ is unsatisfiable, therefore, the finite language $L_F \cap \Sigma^{\le l}$
has $\{a\}^*$ as a cover language.

In a similar fashion as we built the automata $A_{X_k}$, we can construct an
automaton $B_i$ that recognizes the language $L(B_i) = \{a^n \mid n \bmod p_i \notin \{0, 1\}\}$
in $O(p_n)$ time. Let $B$ be an automaton recognizing $L_B = \bigcup_{i=1}^{n} L(B_i)$. It
is straightforward that it can be constructed in $O(n \cdot p_n)$ time. For each
clause $C_i$, such that $a_1, a_2, a_3$ is an assignment of its variables for which $C_i$ is
not satisfied, we define $L_{C_i} = \bigcap_{j=1}^{3} \{a^n \mid n \bmod p_j = a_j\}$. An automaton $C_i$
accepting $L_{C_i}$ can be constructed in $O(p_n^3)$ time (see footnote 6). For every sequence $s$ of 0s
and 1s of length $n$, there is a unique number $m \in \mathbb{Z}_k$ such that $f(m) = s$.
In [14], the binary sequence $s$ of length $n$ is called representation, and its
corresponding number $m$ is called assignment. The range of $f$ may contain
other sequences in $\prod_{i=1}^{n} \mathbb{Z}_{p_i}$, and using the above observation, we deduce
that for the language $L_B$, we have


$L_B = \{a^i \mid i$ does not represent an assignment$\}$,
while for $L_C$ we have
$L_C = \{a^i \mid f(i)$ does not satisfy $F\}$.


We set $L_F = L_C \cup L_B$, where $L_C = \bigcup_{i=1}^{m} L_{C_i}$. If $F$ is satisfiable, then $L_F$ is
a cyclic language with period at most $k$, and the minimal period of $L_F$ is 5,
according to [10, 12]. Thus, setting $l = k$, we have that $L_F \cap \{a\}^{\le l}$ has $\{a\}^*$
as a cover language, iff $F$ is unsatisfiable. It follows that for some $l \in \mathbb{N}$,
$\{a\}^*$ is an l-cover language for $L_F \cap \Sigma^{\le l}$, iff $F$ is unsatisfiable. Please note
that the construction is similar to the one in [14], but in our case, we also
prove that the constant $l$ exists, and the language constructed is an l-cover
language.


5 Theorem 1.3.3, page 21
6 Using Cartesian product construction, for example.
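The reduction can be simulated on tiny formulas by working with exponents instead of automata. The sketch below is an illustration under stated assumptions: clauses are written DIMACS-style as tuples of non-zero integers (`+v` for $x_v$, `-v` for $\neg x_v$), variable $v$ is assigned the $v$-th prime, and membership of $a^i$ is decided from the residues of $i$. Because membership depends only on those residues, checking $0 \le i < k$ is enough.

```python
from itertools import product
from math import prod


def first_primes(n):
    primes, candidate = [], 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def in_LB(i, primes):
    """a^i is in L_B iff some residue of i is neither 0 nor 1."""
    return any(i % p not in (0, 1) for p in primes)


def in_LC(i, clauses, primes):
    """a^i is in L_C iff the residues of i falsify some clause: each positive
    literal's variable has residue 0 and each negated one has residue 1."""
    return any(all(i % primes[abs(u) - 1] == (0 if u > 0 else 1) for u in clause)
               for clause in clauses)


def language_is_everything(clauses, n_vars):
    primes = first_primes(n_vars)
    k = prod(primes)
    return all(in_LB(i, primes) or in_LC(i, clauses, primes) for i in range(k))


def satisfiable(clauses, n_vars):
    """Brute-force reference check, used only to compare with the reduction."""
    for bits in product((0, 1), repeat=n_vars):
        if all(any(bits[abs(u) - 1] == (1 if u > 0 else 0) for u in clause)
               for clause in clauses):
            return True
    return False
```

On these small inputs `language_is_everything` returns `True` exactly when the formula is unsatisfiable, which is the equivalence the construction relies on.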

Since according to [1] primality testing can be done in polynomial time,
we can find the first $n$ prime numbers in polynomial time, meaning that our
NFA construction can also be done in polynomial time. Hence, checking if
$\{a\}^*$ is an l-cover language for $L_F \cap \{a\}^{\le l}$ is in NP.

If $F$ is unsatisfiable, then $\mathrm{ncsc}(L) = 1$, otherwise the minimal number
of states in an NFA is at least equal to the largest prime number dividing its
period, $p_n$. To prove that finding the minimal NFCA for $L_F$ is NP-hard, we
use the same argument as in [14]: the existence of a polynomial algorithm
to decide if $\mathrm{ncsc}(L) = o(n)$ implies that $\mathrm{nsc}(L) = o(n)$, which implies that
we can solve 3SAT in polynomial time, i.e., P = NP. This means that
minimizing NFCAs is at least NP-hard. Consequently, we proved that:


Theorem 2 Minimizing either NFCAs or l-NFCAs is at least NP-hard.


In the next section we analyze some methods to reduce the number of
states of an NFCA, because, unless P = NP, no polynomial-time
minimization algorithm exists.


5 Reducing the Number of States of NFCAs 


Assume the DFA $A = (Q, \Sigma, \delta, q_0, F)$ is minimal for $L$, and the minimal NFA
is $A' = (Q', \Sigma, \delta', q_0, F)$, where $Q' = Q - \{d\}$, $\delta'(s, p) = \delta(s, p)$ if $\delta(s, p) \in Q'$,
and $\delta'(s, p) = \emptyset$ if $\delta(s, p) = d$. In other words, the minimal NFA is the same
as the DFA, except that we delete the dead state. We may have a minimal
DFCA as $A$, and $A'$ as a minimal NFA, but not as a minimal NFCA, as
illustrated by $A_{X_k}$ and $L_{X_{k,l}}$.

We need to investigate if classical methods to reduce the number of
states in an NFA or DFA/DFCA can also be applied to NFCAs, thus, we
first analyze the state merging technique. For NFAs, we distinguish between
two main ways of merging states: (1) a weak method, where two states are
merged by simply collapsing one into the other and consolidating all their
input and output transitions [5], and (2) a strong method, where one state
is merged into another one by redirecting its input transitions toward the
other state, and completely deleting it and all its output transitions [9]. We
must note that in the case of NFCAs, the size of an NFA without mergeable
states is bounded, as we cannot have a path from the initial state to the
final ones that is longer than $l$. This observation contrasts with the result
obtained in [9] for NFAs, which presents a class of arbitrarily large
NFAs without any group of a fixed size $k$ of mergeable states. The same
methods are considered for NFCAs.


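Definition 3 Let $A = (Q, \Sigma, \delta, q_0, F)$ be an NFCA for the finite language
$L$.


1. We say that the state $q$ is weakly mergeable in state $p$ if the automaton
$A' = (Q', \Sigma, \delta', q_0, F')$, where $Q' = Q - \{q\}$, $F' = F \cap Q'$, and


$$\delta'(s,a) = \begin{cases} \delta(s,a), & \text{if } \delta(s,a) \subseteq Q' \text{ and } s \ne p, \\ (\delta(s,a) \setminus \{q\}) \cup \{p\}, & \text{if } q \in \delta(s,a) \text{ and } s \ne p, \\ (\delta(s,a) \cup \delta(q,a)) \setminus \{q\}, & \text{if } s = p \end{cases}$$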


is also an NFCA for L. In this case we write p X q. 


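2. We say that the state $q$ is strongly mergeable in state $p$, if the
automaton $A' = (Q', \Sigma, \delta', q_0, F')$, where $Q' = Q - \{q\}$, $F' = F \cap Q'$,
and

$$\delta'(s,a) = \begin{cases} \delta(s,a), & \text{if } \delta(s,a) \subseteq Q', \\ (\delta(s,a) \setminus \{q\}) \cup \{p\}, & \text{if } q \in \delta(s,a), \end{cases}$$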


is also an NFCA for L. In this case we write p X q. 


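In case $q$ is weakly mergeable in $p$, $(L^L_p L^R_p \cup L^L_p L^R_q \cup L^L_q L^R_p \cup L^L_q L^R_q) \cap \Sigma^{\le l} \subseteq L$, and in case
$q$ is strongly mergeable in $p$,
$L^L_q L^R_p \cap \Sigma^{\le l} \subseteq (L^L_p L^R_p \cup L^L_q L^R_p) \cap \Sigma^{\le l} \subseteq L$, where for $s \in Q$, $L^L_s = \{w \in \Sigma^* \mid
s \in \delta(q_0, w)\}$ and $L^R_s = \{w \in \Sigma^* \mid \delta(s, w) \cap F \ne \emptyset\}$.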
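Both merging operations of Definition 3 can be written as transformations of the transition table. The sketch below uses the dictionary encoding assumed for illustration in this text. The automaton is a hypothetical unary NFCA for $\{a^2, a^4\}$ with $l = 4$, chosen so that the two merges behave differently; it does not reproduce any figure of the paper.

```python
from itertools import product


def cover_language(delta, q0, finals, alphabet, l):
    """All accepted words of length at most l."""
    accepted = set()
    for n in range(l + 1):
        for letters in product(alphabet, repeat=n):
            current = {q0}
            for a in letters:
                current = set().union(*(delta.get((s, a), set()) for s in current))
            if current & finals:
                accepted.add("".join(letters))
    return accepted


def strong_merge(delta, finals, p, q):
    """Delete q with its outgoing transitions; redirect its incoming ones to p."""
    new_delta = {}
    for (s, a), targets in delta.items():
        if s == q:
            continue
        new_delta[(s, a)] = (targets - {q}) | {p} if q in targets else set(targets)
    return new_delta, finals - {q}


def weak_merge(delta, finals, p, q):
    """Collapse q into p: p also inherits the outgoing transitions of q."""
    new_delta = {}
    for (s, a), targets in delta.items():
        if s in (p, q):
            continue
        new_delta[(s, a)] = (targets - {q}) | {p} if q in targets else set(targets)
    for a in {a for (s, a) in delta if s in (p, q)}:
        merged = delta.get((p, a), set()) | delta.get((q, a), set())
        new_delta[(p, a)] = merged - {q}
    return new_delta, finals - {q}


# Hypothetical NFCA: 0 -a-> 1 -a-> 2 -a-> 3 -a-> 4 (final, with an a-loop),
# and 0 -a-> 5 -a-> 6 (final).
delta = {(0, "a"): {1, 5}, (1, "a"): {2}, (2, "a"): {3},
         (3, "a"): {4}, (4, "a"): {4}, (5, "a"): {6}}
finals = {4, 6}
```

Strongly merging state 3 into state 5 keeps the language $\{a^2, a^4\}$ on words of length at most 4, while weakly merging them lets state 5 reach the loop on state 4 and so accepts $a^3$.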

For the case of DFCAs, if A is a DFCA for L, and two states are 
similar with respect to the similarity relation induced by A, then all the 
words reaching these states are similar. Moreover, if two words of minimal 
length reach two distinct states in a DFCA, and the words are similar with 
respect to L, then the states in the DFCA must be similar with respect 
to the similarity relation induced by A. These results are used for DFCA 
minimization, and we need to verify if they can be used in case of NFCAs. 
In the following lemmata we show that the corresponding results are no 
longer true. 


Lemma 3 Let $A = (Q, \Sigma, \delta, q_0, F)$ be an NFCA for the finite language $L$.
It is possible that $x_A(s) \sim_L x_A(q)$, but $s$ and $q$ are not mergeable.


Proof: For the automaton in Figure 9, left, $x_A(3) = x_A(1)$, but the states
1 and 3 are not mergeable, as the resulting automaton would not reject a’. 

Lemma 4 Let $A = (Q, \Sigma, \delta, q_0, F)$ be an NFCA for the finite language $L$,
and $p, q \in Q$, $p \ne q$. It is possible to have $x, y \in \Sigma^*$, $p \in \delta(q_0, x)$, $q \in \delta(q_0, y)$,
$p \sim_A q$, and $x \not\sim_L y$.


Proof: Consider the language $L = L(A) \cap \{a,b\}^{\le 14}$, where $A$ is depicted
in Figure 10.


Figure 10: An example where $p \sim_A q$, $x \not\sim_L y$, but $p \in \delta(q_0, x)$ and
$q \in \delta(q_0, y)$, namely, $aa \not\sim_L ba$, $2 \in \delta(0, ba)$, $7 \in \delta(0, aa)$, and $2 \sim_A 7$.


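We have that:
• $aa \not\sim_L ba$, because $aaa \notin L$, but $baa \in L$;
• $2 \in \delta(0, ba)$, $7 \in \delta(0, aa)$, and


• $2 \sim_A 7$, because $\delta(2, a^{2k}) = \{2\} \subseteq F$, $\delta(2, a^{2k+1}) \cap F = \emptyset$,
$\delta(7, a^{2k}) \cap F \ne \emptyset$, $\delta(7, a^{2k+1}) \cap F = \emptyset$, and $\delta(2, w) =
\delta(7, w) = \emptyset$, for all $w \in \Sigma^* - \{a\}^*$.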


Let us verify the case when two states p,q are similar, or we can 
distinguish between them. 


Lemma 5 Let $A = (Q, \Sigma, \delta, q_0, F)$ be an NFCA for the finite language $L$,
$p, q \in Q$, $p \ne q$, and either $p, q \in F$, or $p, q \notin F$. Assume $r \in \delta(p, a)$ and
$s \in \delta(q, a)$.


1. If $r \sim_A s$, for all possible choices of $r$ and $s$, then $p \sim_A q$.


2. The converse is false, i.e., we may have $r \not\sim_A s$, for some $r$ and $s$, and
$p \sim_A q$.
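
Proof: Assume $p \not\sim_A q$. Because either $p, q \in F$, or $p, q \notin F$, any word
distinguishing $p$ from $q$ is non-empty, so there is a word
$aw \in \Sigma^{\le l - \max\{\mathrm{level}(p), \mathrm{level}(q)\}} \cap \Sigma^+$ such that
$\delta(p, aw) \cap F \ne \emptyset$ and $\delta(q, aw) \cap F = \emptyset$,
or $\delta(p, aw) \cap F = \emptyset$ and $\delta(q, aw) \cap F \ne \emptyset$. If $\delta(p, aw) \cap F \ne \emptyset$ and
$\delta(q, aw) \cap F = \emptyset$, it follows that we have two states $r \in \delta(p, a)$ and $s \in \delta(q, a)$
such that $\delta(r, w) \cap F \ne \emptyset$, and $\delta(s, w) \cap F = \emptyset$. This proves that the first
implication is true.

For the second implication, consider the automaton depicted in Figure 10
with $l = 14$, and the following states $p, q, r, s$: $p = q = 0$, $r = 1$, $s = 3$, and
the letter $b$. We have that $p \sim_A q$, $1, 3 \in \delta(p, b) = \delta(q, b) = \delta(0, b)$, but
$r \not\sim_A s$, because $\delta(1, a) \cap F = \emptyset$ and $\delta(3, a) \cap F = \{4\} \ne \emptyset$.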

This result contrasts with the one for the deterministic case for cover 
automata, and the main reason is the non-determinism, not the fact that we 
work with cover languages. 

Next, we would like to verify if similar states can be merged in case 
of NFCAs, also to check which type of merge works. In case we have two 
similar states, we can strongly merge them as shown in Theorem 3. In the 
case of DFCAs, if two states are similar, these can be merged. We must 
ensure that the same result is also true for NFCAs, and the next theorem 
shows it. 


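Theorem 3 Let $A = (Q, \Sigma, \delta, q_0, F)$ be an NFCA for $L$, and $p, q \in Q$ such
that $p \ne q$, and $p \sim_A q$. Then we have


1. if $\mathrm{level}_A(p) \le \mathrm{level}_A(q)$, then $q$ is strongly mergeable in $p$.


2. It is possible that $q$ is not weakly mergeable in $p$.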


Proof: For the first part, let $A'$ be the automaton obtained from $A$ by
strongly merging $q$ in $p$. We need to show that $A'$ is an NFCA for $L$. Let
$w = w_1 \ldots w_n$ be a word in $\Sigma^{\le l}$, $n \in \mathbb{N}$ and $w_i \in \Sigma$ for all $i$, $1 \le i \le n$. We
now prove that $w \in L$ iff $\delta'(q_0, w) \cap F' \ne \emptyset$.

If we can find the states $\{q_0, q_1, \ldots, q_n\}$ such that $q_1 \in \delta(q_0, w_1)$, $q_2 \in
\delta(q_1, w_2)$, ..., $q_n \in \delta(q_{n-1}, w_n)$, $q_n \in F$ and $q \notin \{q_0, q_1, \ldots, q_n\}$, then $q_1 \in
\delta'(q_0, w_1)$, $q_2 \in \delta'(q_1, w_2)$, ..., $q_n \in \delta'(q_{n-1}, w_n)$, $q_n \in F'$, i.e., $\delta'(q_0, w) \cap F' \ne
\emptyset$. Assume $q = q_j$, and $j$ is the smallest with this property. If $j = n$, then
$q \in F$, which implies $p \in F$, then $q_1 \in \delta'(q_0, w_1)$, $q_2 \in \delta'(q_1, w_2)$, ...,
$p \in \delta'(q_{n-1}, w_n)$, which means $\delta'(q_0, w) \cap F' \ne \emptyset$.

Assume the statement holds for $|w_{j+1} \ldots w_n| < l'$ for $l' \le l - |w_1 \ldots w_j|$
($l - |w_1 \ldots w_j| \le l - \mathrm{level}(q)$), and consider the case when $|w_{j+1} \ldots w_n| = l'$. If
for every non-empty prefix $w_{j+1} \ldots w_h$ of $w_{j+1} \ldots w_n$, $q \notin \delta(p, w_{j+1} \ldots w_h)$,
then $\delta'(p, w_{j+1} \ldots w_n) \cap F' \ne \emptyset$ iff $\delta(q, w_{j+1} \ldots w_n) \cap F \ne \emptyset$.

Otherwise, let $h$ be the smallest number such that $q \in \delta(p, w_{j+1} \ldots w_h)$.
Then $|w_{h+1} \ldots w_n| < l'$ (and $p \in \delta'(p, w_{j+1} \ldots w_h)$). By the induction
hypothesis, $\delta'(p, w_{h+1} \ldots w_n) \cap F' \ne \emptyset$ iff $\delta(q, w_{h+1} \ldots w_n) \cap F \ne \emptyset$. Therefore,
$\delta'(p, w_{j+1} \ldots w_h w_{h+1} \ldots w_n) \cap F' \ne \emptyset$ iff $\delta(q, w_{j+1} \ldots w_h w_{h+1} \ldots w_n) \cap F \ne \emptyset$,
proving the first part. For the second part, consider the automaton in
Figure 11 as an NFCA for $L = \{a^2, a^4\}$. We have that $l = 4$ and $3 \sim_A 5$,
because $\mathrm{level}(3) = 3$, and $\delta(3, \varepsilon) \cap F = \delta(5, \varepsilon) \cap F = \emptyset$, $\delta(3, a) \cap F = \{4\}$,
$\delta(5, a) \cap F = \{6\}$. We cannot weakly merge state 3 with state 5, as we would
recognize $a^3 \notin L$. In Figure 12 we have the result for strongly merging state
3 in state 5.


Figure 11: Example for weakly merging failure and similar states.


Figure 12: The result for strongly merging similar states for the example
presented in Figure 11.


We can observe that strongly merging states does not add words in the 
language, while weakly merging may add words. Because any DFCA is also 
an NFCA, then some smaller automata can be obtained from larger ones 
without using state merging technique, and the following lemma presents 
such a case. Also, the automaton in Figure 2 is obtained from the automaton in
Figure 1 by strongly merging states $0, \ldots, -m+1$ into state $-m$.

Lemma 6 Let $A = (Q, \Sigma, \delta, q_0, F)$ be an NFCA for $L$, and consider the
reduced sub-automaton generated by state $p$, $A_R = (Q_R, \Sigma, \delta_R, p, F)$,
i.e., $Q_R$ contains only reachable and useful states, and $\delta_R$ is the
induced transition function. If $\delta(s, a) \cap Q_R = \emptyset$, for all
$s \in (Q \setminus Q_R)$, we can find two regular languages $L_1, L_2$ such that


• $L_p = (L_1 \cup L_2) \cap \Sigma^{\le l - level(p)}$, and
• $nsc(L_1) + nsc(L_2) < \#Q_R + 1$,


then $A$ is not minimal.


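Proof: Let $A_i = (Q_i, \Sigma, \delta_i, q_{0,i}, F_i)$, $i = 1, 2$, be two NFAs
for $L_1$ and $L_2$, and $L_p = (L_1 \cup L_2) \cap \Sigma^{\le l - level(p)}$. We
define the automaton
$B = ((Q \setminus Q_R) \cup \{p\} \cup Q_1 \cup Q_2, \Sigma, \delta_B, q_0, F_B)$
as follows: $F_B = (F \setminus Q_R) \cup F_1 \cup F_2$, in case $p \notin F$, and
$F_B = (F \setminus Q_R) \cup F_1 \cup F_2 \cup \{p\}$ in case $p \in F$. For the
transition function, we have $\delta_B(s, a) = \delta(s, a)$ if
$s \in (Q \setminus Q_R)$, $\delta_B(s, a) = \delta_i(s, a)$ if $s \in Q_i$,
$i = 1, 2$, and
$\delta_B(p, a) = \delta_1(q_{0,1}, a) \cup \delta_2(q_{0,2}, a) \cup \delta(p, a) \setminus Q_R$,
if $p \notin \delta(p, a)$, and
$\delta_B(p, a) = \delta_1(q_{0,1}, a) \cup \delta_2(q_{0,2}, a) \cup \delta(p, a) \setminus Q_R \cup \{p\}$,
if $p \in \delta(p, a)$. Obviously, the automaton $B$ recognizes a cover
language for $L$, and its state complexity is lower.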
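
The claim that an automaton recognizes a cover language for $L$ can be
checked by brute force on small instances. The sketch below assumes the
standard cover condition, namely that the automaton and $L$ agree on every
word of length at most $l$. The function names, the dictionary encoding of
the transition function, and the toy unary example are illustrative
assumptions and are not taken from the paper.

```python
from itertools import product


def accepts(delta, start, finals, word):
    """Simulate an NFA given as dict (state, symbol) -> set of states."""
    current = {start}
    for symbol in word:
        current = {t for s in current for t in delta.get((s, symbol), set())}
    return bool(current & finals)


def is_cover_automaton(delta, start, finals, alphabet, language, l):
    """Check that the NFA and `language` agree on all words of length <= l."""
    for length in range(l + 1):
        for letters in product(alphabet, repeat=length):
            word = "".join(letters)
            if accepts(delta, start, finals, word) != (word in language):
                return False
    return True


# Hypothetical example: a two-state cycle accepting all odd-length words
# over {a} is a cover automaton for L = {a, aaa} when l = 3.
cycle = {(0, "a"): {1}, (1, "a"): {0}}
print(is_cover_automaton(cycle, 0, {1}, "a", {"a", "aaa"}, 3))
```

The check is exponential in $l$, so it is only meant for validating small
constructions such as the automaton $B$ above, not as a minimization method.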


This technique was used to produce the minimal NFCA for $L_{F_{m,n}}$ in
Figure 9.


6 State Merging and Fooling Sets 


In this section we analyze the relation between mergeable states and fooling
sets; more precisely, we would like to use a fooling set to identify states
that are mergeable or not. We consider both strong and weak mergeability.
First, we start with a technical lemma.


Lemma 7 Let $S = \{(x_i, y_i) \mid 1 \le i \le n\}$ be a(n) (extended) fooling set
for the finite language $L$. If $i, j$ are such that $1 \le i, j \le n$, $i \ne j$,
then either $|x_i y_j| \le l$ or $|x_j y_i| \le l$.


Proof: Assume $|x_i y_j| > l$. It follows that $|x_i| + |y_j| > l$, i.e.,
$|y_j| > l - |x_i|$. Because $x_j y_j \in L$, $|x_j y_j| \le l$, i.e.,
$|x_j| + |y_j| \le l$, hence $|y_j| \le l - |x_j|$, thus
$l - |x_i| < l - |x_j|$, which means that $|x_j| < |x_i|$. We have that
$|x_j y_i| = |x_j| + |y_i| < |x_i| + |y_i| \le l$.
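
The length argument of Lemma 7 can be checked mechanically on concrete
pairs. The following sketch assumes only what the proof uses, namely that
every $x_i y_i$ has length at most $l$. The function name and the sample
pairs are illustrative assumptions.

```python
def lemma7_holds(pairs, l):
    """Return True if, for all i != j, |x_i y_j| <= l or |x_j y_i| <= l."""
    for i, (xi, yi) in enumerate(pairs):
        for j, (xj, yj) in enumerate(pairs):
            if i != j and len(xi + yj) > l and len(xj + yi) > l:
                return False
    return True


# Every x_i y_i below has length 3 = l, so the lemma applies.
good_pairs = [("", "aab"), ("a", "ab"), ("aa", "b")]

# Here |x_i y_i| = 4 > l = 3, so the hypothesis of the lemma fails and
# both cross words may exceed l.
bad_pairs = [("aa", "aa"), ("bb", "bb")]

print(lemma7_holds(good_pairs, 3), lemma7_holds(bad_pairs, 3))
```

The second sample shows that the bound $|x_i y_i| \le l$, which holds because
$x_i y_i \in L$, is the hypothesis the lemma actually relies on.
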

If the size of a fooling set for a finite language $L$ equals the number of
states of a given NFCA, then the NFCA is minimal. In case the NFCA has more
states, it could still be minimal. However, we would like to investigate
whether it is not minimal, in which case some states might be either weakly
or strongly mergeable.

The following result identifies, for a given fooling set, the states that
are mergeable in an NFCA.


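Theorem 4 Let $A = (Q, \Sigma, \delta, q_0, F)$ be an NFCA for $L$, and
$S = \{(x_i, y_i) \mid 1 \le i \le n\}$ a fooling set for $L$. Let $p, q \in Q$,
$p \ne q$, be two states, and $i, j$, $1 \le i, j \le n$, $i \ne j$, be such that
$p \in \delta(q_0, x_i)$ and $q \in \delta(q_0, x_j)$. Then the following
statements hold true:


1. If $\delta(q, y_j) \cap F \ne \emptyset$, and $\delta(p, y_i) \cap F \ne \emptyset$, then $p \not\approx_A q$.
2. If $\delta(p, y_j) \cap F \ne \emptyset$, and $|x_j y_i| > l$, then $p \not\approx_A q$.


3. If $|x_i y_j| > l$ and $\delta(p, y_i) \cap F \ne \emptyset$, then $p \not\approx_A q$.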


Proof: Because $S$ is a fooling set, it follows that $x_i y_j \notin L$ and
$x_j y_i \notin L$. We now analyze each case of the theorem:


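1. If $p \approx_A q$, it follows that by merging $q$ with $p$, we obtain an
equivalent NFCA $B = (Q - \{q\}, \Sigma, \delta_B, q_0, F - \{q\})$ such that
$\emptyset \ne \delta(q, y_j) \cap F \subseteq \delta_B(p, y_j) \cap F_B$.
Because $x_i y_j \notin L$, we must have $|x_i y_j| > l$, which implies
$|x_j y_i| \le l$. From $q \in \delta(q_0, x_j)$, it follows
$p \in \delta_B(q_0, x_j)$. But $\delta(p, y_i) \cap F \ne \emptyset$, therefore
$\delta_B(q_0, x_j y_i) \cap F \ne \emptyset$, i.e., $x_j y_i \in L$, which is
a contradiction.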


2. If $p \approx_A q$, it follows that by merging $q$ with $p$, we obtain an
equivalent NFCA $B = (Q - \{q\}, \Sigma, \delta_B, q_0, F - \{q\})$ such that
$\emptyset \ne \delta(p, y_j) \cap F \subseteq \delta_B(p, y_j) \cap F_B$.
Because $S$ is a fooling set and $|x_j y_i| > l$, using Lemma 7 we have that
$|x_i y_j| \le l$. But $\delta_B(q_0, x_i y_j) \cap F \ne \emptyset$, therefore
$x_i y_j \in L$, which is a contradiction.


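3. If $p \approx_A q$, it follows that by merging $q$ into $p$, we obtain an
equivalent NFCA $B = (Q - \{q\}, \Sigma, \delta_B, q_0, F - \{q\})$ such that
$\emptyset \ne \delta(q, y_j) \cap F \subseteq \delta_B(p, y_j) \cap F_B$.

Because $|x_i y_j| > l$, using Lemma 7, we have that $|x_j y_i| \le l$. From
$q \in \delta(q_0, x_j)$, it follows $p \in \delta_B(q_0, x_j)$. But
$\delta(p, y_i) \cap F \ne \emptyset$, therefore
$\delta_B(q_0, x_j y_i) \cap F \ne \emptyset$, i.e., $x_j y_i \in L$, which is
a contradiction.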

Because in the proof of 2 and 3 of Theorem 4 we use the condition on
the length of the words in the fooling set, only one of the words $x_i y_j$ or
$x_j y_i$ has to be tested for membership in $L$. Thus, if $S$ is an extended
fooling set, both 2 and 3 of Theorem 4 will hold.
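
The difference between the two notions can be made concrete with a small
membership check. The sketch assumes the usual definitions, consistent with
how they are used in the proofs above: every $x_i y_i$ is in $L$, and for
$i \ne j$ a fooling set requires both cross words $x_i y_j$ and $x_j y_i$ to
be outside $L$, while an extended fooling set requires at least one of them
to be outside $L$. The function name and the sample languages are
illustrative assumptions.

```python
def is_fooling_set(pairs, language, extended=False):
    """Check the (extended) fooling set conditions for a finite language.

    pairs: list of (x_i, y_i) strings; language: set of strings.
    """
    for i, (xi, yi) in enumerate(pairs):
        if xi + yi not in language:
            return False
        for j in range(i + 1, len(pairs)):
            xj, yj = pairs[j]
            outside = [xi + yj not in language, xj + yi not in language]
            # Fooling set: both cross words outside L.
            # Extended fooling set: at least one cross word outside L.
            if not (any(outside) if extended else all(outside)):
                return False
    return True


# Hypothetical examples.
strict = [("a", "b"), ("b", "a")]            # fooling set for {ab, ba}
only_extended = [("", "a"), ("a", "b")]      # for {a, ab, b}: "b" is in L, "aa" is not

print(is_fooling_set(strict, {"ab", "ba"}))
print(is_fooling_set(only_extended, {"a", "ab", "b"}, extended=True))
```

Every fooling set passes the extended check as well, which is why only the
first statement of Theorem 4 needs a separate argument.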

For 1 of Theorem 4, and the case where $S$ is an extended fooling set,
we must consider two cases:


1. $x_i y_j \notin L$, and


2. $x_j y_i \notin L$.


If $x_i y_j \notin L$, then we have the same proof. If $x_j y_i \notin L$, from
$\delta(p, y_i) \cap F \ne \emptyset$, it follows that $|x_j y_i| > l$, and using
Lemma 7, we have that $|x_i y_j| \le l$. Because
$\delta_B(q_0, x_i y_j) \supseteq \delta(q, y_j) \setminus \{q\} \ne \emptyset$,
it follows that $x_i y_j \in L$, which is a contradiction.

Hence, Theorem 4 holds for both fooling sets and extended fooling
sets. The converse is not true, as the fooling set technique does not provide
a tight lower bound for the number of states. In the following examples, we
show that if some of the conditions of Theorem 4 are not satisfied, then the
states can be mergeable.


Figure 13: Example of fooling set and mergeable states not satisfying the 
conditions in Theorem 4. 


Example 1 In the following example, Figure 13, we have an NFCA with
$n+5$ states for the language $L_{F_{m,n}} = \{a, b\}^{\le m} a \{a, b\}^{n-2}$,
that is the same as the one in Figure 1. In case $m = 2$ and $n = 4$, the
language is the same as the one described in Figure 3. A fooling set or
extended fooling set can be $S = \{(a^i, ab^{n-1-i}) \mid 0 \le i < n\}$, which
guarantees that the minimal NFCA has at least $n$ states. An equivalent
minimal NFA has $m+n$ states.

We also have the following: 

1. If $p = 0$ and $q = 1'$, then $p \approx_A q$. However, for the following
values, $i = 1$ and $j = 2$, we have that $p \in \delta(-1, x_i)$ and
$q \in \delta(-1, x_j)$. In this case,
ca; — 00; 4 ao ga and 0G. i a 


2. Forp=0 andq=2',i=1, andj =n-—1, we have x; =a, een iia 


y= ab" 11 y, =0a,052, and d(q,n)NF =9, but |aiy;| <1, and 
|x; yal >. 


a Por p= 0; ¢= 3.42 1end 7 = 4, we have 2 = 6, = aaaa, 
yaa ty Sab 0 sit OL Sand dy) k=O) 


Remark 1 For p = 0 and q = 2',i1 = 1, and j = 3, we have x; = a, 
t= aaa, yy; =a) 4g = ab 0s 2 and iGO Ho 


The examples presented, together with Theorem 4, show the difficulty
of finding mergeable states in an NFCA, even if we already know a
fooling set. This suggests that expanding the study, even for the case of
NFAs, may produce some useful practical results.


7 Conclusion 


In this paper we showed that NFCAs are a more compact representation of
finite languages than both NFAs and DFCAs, which makes them a subject worth
investigating. We presented a lower-bound technique for the state complexity
of NFCAs, and proved its limitations. We proved that minimizing NFCAs has
at least the same level of difficulty as minimizing general NFAs, and that
extra information about the maximum length of the words in the language
does not help reduce the time complexity. We checked whether some of the
results on reducing the size of automata for NFAs and DFCAs are
still valid for NFCAs, and proved that most of them are no longer valid.
However, the method of strongly merging states still works in the case of
NFCAs, and we showed that there are also other methods that could be
investigated.

We also presented an interesting connection between mergeability and
fooling sets, which could be further extended.

As future research, we list below some problems that we consider worth 
investigating: 


1. check if the bipartite graph lower-bound technique can be applied for 
NFCAs; 

2. find bounds for non-deterministic cover state complexity;


3. investigate the problem of magic numbers for NFCAs. In this case, we 
can relate either to DFCAs, or DFAs. 